Factor Analysis in Data Mining

Authors

  • Richard L. Peterson
  • Chen-Fu Chien
  • Ruben Xing
Abstract

Rapid advances in information technology allow data to be accumulated faster and in much larger quantities (e.g., in data warehouses). Faced with vast new information resources, scientists, engineers, and business people need efficient analytical techniques to extract useful information and uncover new, valuable knowledge patterns. Data preparation is the first step in exploring data for potentially useful information. However, redundant dimensions (i.e., variables) may remain in the data even after the data are well prepared, and this redundancy degrades the performance of data-mining methods. Factor Analysis (FA) is one commonly used method for reducing data dimensions to a small number of substantial characteristics. FA is a statistical technique used to find an underlying structure in a set of measured variables. It proceeds by finding new independent variables (factors) that describe the patterns of relationships among the original dependent variables. With FA, a data miner can determine whether some variables should be grouped as a distinguishing factor, based on how these variables are related. Thus, the number of factors will be smaller than the number of original variables in the data, enhancing the performance of the data-mining task. In addition, the factors may reveal underlying attributes that cannot be observed or interpreted explicitly, so that, in effect, a reconstructed version of the data is created and used to draw hypothesized conclusions. In general, FA is used with many data-mining methods (e.g., neural networks, clustering).
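The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the principal-component method of factor extraction (eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues), applies no rotation, and runs on synthetic data invented for illustration. All function names (`correlation_matrix`, `top_eigenpairs`, `factor_loadings`, `group_variables`) are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(p)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(p)]
            for a in range(p)]


def top_eigenpairs(matrix, k, iterations=500):
    """Leading k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation."""
    p = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _factor in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary start vector
        for _step in range(iterations):
            w = [sum(work[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        eigenvalue = sum(v[a] * sum(work[a][b] * v[b] for b in range(p))
                         for a in range(p))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found before the next pass.
        for a in range(p):
            for b in range(p):
                work[a][b] -= eigenvalue * v[a] * v[b]
    return pairs


def factor_loadings(rows, k):
    """Loadings (variables x factors) by the principal-component method."""
    pairs = top_eigenpairs(correlation_matrix(rows), k)
    p = len(rows[0])
    return [[math.sqrt(lam) * vec[j] for lam, vec in pairs] for j in range(p)]


def group_variables(loadings):
    """Assign each variable to the factor with its largest absolute loading."""
    return [max(range(len(row)), key=lambda f: abs(row[f])) for row in loadings]


# Synthetic data (an assumption for illustration): six observed variables,
# four driven by hidden factor f1 and two driven by hidden factor f2.
rng = random.Random(0)
rows = []
for _obs in range(500):
    f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([f1 + rng.gauss(0, 0.5) for _ in range(4)]
                + [f2 + rng.gauss(0, 0.5) for _ in range(2)])

loadings = factor_loadings(rows, k=2)
groups = group_variables(loadings)
print(groups)  # one factor index per original variable
```

Here six correlated variables are summarized by two factors, and each variable is assigned to the factor on which it loads most strongly. In practice a statistics library with proper maximum-likelihood estimation and rotation would usually be preferable to this hand-rolled extraction.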

Similar resources

Separation of Geochemical Anomalies Using Factor Analysis and Concentration-Number (C-N) Fractal Modeling Based on Stream Sediments Data in Esfordi 1:100000 Sheet, Central Iran

The aim of this study is to separate Fe2O3, TiO2 and V2O5 anomalies in the Esfordi 1:100,000 sheet, which is located in the Bafq district, Central Iran. The elements analyzed in stream sediment samples taken in the area can be classified into 5 groups (factors) by factor analysis. The Concentration–Number (C-N) fractal model was used to delineate the Fe2O3, TiO2 and V2O5 thresholds. According to...

Determination of geochemical anomalies and gold mineralized stages based on litho-geochemical data for Zarshuran Carlin-like gold deposit (NW Iran) utilizing multi-fractal modeling and stepwise factor analysis

The Zarshuran Carlin-like gold deposit is located at the Takab Metallogenic belt in the northern part of the Sanandaj-Sirjan zone, NW Iran. The high-grade ore bodies are mainly hosted by black shale and cream to gray massive limestone along the NNE-trending extensional fault/fracture zones. The aim of this investigation was to determine and separate the gold mineralized stages based on the surf...

New Approaches to Analyze Gasoline Rationing

In this paper, the relation among factors in the road transportation sector from March 2005 to March 2011 is analyzed. Most previous studies take an economic point of view on gasoline consumption. Here, a new approach is proposed in which different data mining techniques are used to extract meaningful relations between the aforementioned factors. The main and dependent factor is gasolin...

The Use of Robust Factor Analysis of Compositional Geochemical Data for the Recognition of the Target Area in Khusf 1:100000 Sheet, South Khorasan, Iran

The closed nature of geochemical data has been proven in many studies. Compositional data have special properties that mean that standard statistical methods cannot be used to analyse them. These data imply a particular geometry called Aitchison geometry in the simplex space. For analysis, the dataset must first be opened by the various transformations provided. One of the most popular of the a...

Detection of Main Rock Type for Rare Earth Elements (REEs) Mineralization Using Staged Factor and Fractal Analysis in Gazestan Iron-Apatite Deposit, Central Iran

The Gazestan magnetite-apatite deposit is located in the Bafq region of Central Iran. It occurs in the form of veins, veinlets, and small apatite lenses, as well as magnetite, in metasomatic rock types such as green chlorite-actinolite rock units. These rocks are situated in the carbonate-volcanic complex of the Upper Precambrian-Lower Cambrian Rizo formation. In this study, staged factor analysis...

Application of continuous restricted Boltzmann machine to detect multivariate anomalies from stream sediment geochemical data, Korit, East of Iran

Anomaly separation using stream sediment geochemical data plays an essential role in regional exploration. Many different techniques have been proposed to distinguish anomalous areas within a study area. In this research, a continuous restricted Boltzmann machine (CRBM), which is a generative stochastic artificial neural network, was used to recognize the mineral potential area in the Korit 1:100000 sheet, loc...

Publication date: 2015